MATCHING BIOLOGY CONCEPTS AND STATISTICS

TREE RING ANALYSIS OVER TIME

VASCULAR PLANT DIVERSITY VARIATION ALONG SITES AND SOIL TEMPERATURE

INTRODUCTION

An interesting concept in biology, and especially in forestry, is the dynamics of tree rings over time. Dendroclimatology uses growth-ring patterns to reconstruct past variations in climate (Fritts, 1987). Well-defined annual growth rings can be observed in the wood of many temperate forest tree species throughout the world. Under certain circumstances, these rings hold useful information about the environmental conditions that affected growth, such as changes in temperature and humidity, as well as about tree features (age and size). How much they reveal depends on the species and latitude, so other data (for example, climate records) may need to be included in the analysis (Tumajer & Lehejček, 2019).

Another key concept in biology is diversity. If you have ever wondered how plant species richness relates to space and environment, this is the right place to learn some basic biological and statistical concepts. There are over 352 000 species of vascular plants in the world (391 000 according to Jin and Qian, 2019). More than 95% of vascular plants are flowering plants, also called angiosperms (e.g. grasses, orchids, maple trees). The other types of vascular plants are gymnosperms (cone-bearing trees, e.g. pine trees, spruce trees) and seedless plants (e.g. ferns, horsetails) (see the figure of vascular plants below). In Canada, 5111 species of vascular plants have been found (CESCC, 2010). Such an amazing variety of life forms invites biologists to wonder how diversity works in nature.

OBJECTIVES

By the end of this tutorial you will be able to explore patterns of tree rings over time. Perhaps something is hidden there! You will also understand why biology and statistics are a good match, and you will have the basics of linear regression applied to species richness as a function of a couple of independent variables.

How BIG is Canada’s Boreal Forest? A reason to understand vascular plant diversity

UNDERSTANDING VASCULAR PLANT DIVERSITY: STUDY CASE

Alberta, which covers about 660 000 km², is a diverse Canadian province. Almost 2000 species of vascular plants have been recorded there (almost 1500 of them native) (Packer and Gould, 2017). An interesting biodiversity monitoring project is "Seasonal and annual dynamics of western Canadian boreal forest plant communities: a legacy dataset spanning four decades". The primary purpose of the Seasonal Dynamics (SEADYN) and later Annual Dynamics (ANNDYN) research projects was to document seasonal changes in vegetation composition during the snow-free season (May through October), and longer-term changes in vegetation and forest mensuration, for boreal forest stands in Alberta, Canada, dominated by Pinus banksiana (Lamb.) (see the central image in the figure below).

Two regions were used for this study: one in the Hondo-Slave Lake (hereafter, Hondo) region of Alberta, which was surveyed from 1980 to 2015, and a second in the Athabasca Oil Sands (hereafter, AOS) region in northeastern Alberta, which was surveyed from 1981 to 1984 and is thought to have substantial atmospheric pollution due to regional industrial development (oil sands mining and processing). To reveal how biodiversity connects with space and at least one environmental variable, we will focus on the effect of stand and soil temperature on species richness in 2010, using only the Hondo stands of this project. The Hondo stands are north of Edmonton and east of Lesser Slave Lake, Alberta (AB), Canada (bottom right map panel). The 2010 Hondo vascular plant dataset comprises 131 species. At these sites, the maximum number of species found between 1980 and 2015 was 215.

EXPERIMENTAL DESIGN

The experimental design consisted of plots of 50x50 m subdivided into 50 5x5 m quadrants. The data from the Hondo monitoring allow us to pose three questions concerning soil temperature and stands.

TREE RING ANALYSIS OVER TIME

Let’s plot some graphs. We can plot the average ring width (mm) on the y axis as a function of time (year) on the x axis (see the red line). We can also plot the same relationship separately for each stand (see the gray lines). Do you have any ideas about what is happening with these trends?
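As a minimal sketch of how such a graph could be drawn, the example below averages ring width per year (red line) and per stand and year (gray lines). The data are simulated, and the column names `tree`, `stand`, `year` and `ring_width_mm` are hypothetical, not taken from the Hondo files.

```r
# Simulated ring-width data; all names and values here are illustrative assumptions
set.seed(1)
rings <- expand.grid(tree = 1:5, stand = factor(1:4), year = 1980:2010)
rings$ring_width_mm <- 2 - 0.03 * (rings$year - 1980) +
  0.2 * sin((rings$year - 1980) / 3) + rnorm(nrow(rings), sd = 0.1)

# Mean ring width per year (all stands pooled) and per stand and year
mean_by_year  <- aggregate(ring_width_mm ~ year, data = rings, FUN = mean)
mean_by_stand <- aggregate(ring_width_mm ~ stand + year, data = rings, FUN = mean)

# Empty frame first, then one gray line per stand and the overall mean in red
plot(ring_width_mm ~ year, data = mean_by_year, type = "n",
     ylim = range(mean_by_stand$ring_width_mm),
     xlab = "Year", ylab = "Mean ring width (mm)")
for (s in levels(mean_by_stand$stand)) {
  lines(ring_width_mm ~ year,
        data = mean_by_stand[mean_by_stand$stand == s, ], col = "gray60")
}
lines(ring_width_mm ~ year, data = mean_by_year, col = "red", lwd = 2)
```

With real data, only the simulated `rings` data frame would need to be replaced.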

BINGO!!! Tree ring width decreases over time, and the changes follow an oscillating pattern. This might suggest that both external factors (temperature, humidity) and internal factors (age, latitude, species) affect tree growth. It seems that stands follow different patterns; perhaps they differ in composition or, why not, in vascular species diversity, which could affect growth. As you can see, the graphs give us some insight into what is going on with tree ring dynamics.

VASCULAR PLANT DIVERSITY VARIATION ALONG SITES AND SOIL TEMPERATURE

Now it is time to explore another central concept in biology: species diversity. Along the way you will learn the main statistical concepts that help us understand how nature works. Considering species richness, we can pose the following questions:

A. Can soil temperature explain vascular plant diversity?

B. Are stands (sites) a better predictor than soil temperature?

C. Do we need to consider both variables together to understand vascular plant diversity variation?

Before graphing species richness as a function of soil temperature and stand, let’s introduce a couple of useful statistical concepts: an introduction to linear regression and how to read a boxplot.

An introduction to linear regression
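As a small illustration of the idea, the sketch below fits a straight line to toy data with `lm()` and checks that the fitted slope is the same as the textbook least-squares formula, covariance of x and y divided by the variance of x. The data are simulated and unrelated to the Hondo dataset.

```r
# Toy data (an assumption for illustration): y decreases linearly with x, plus noise
set.seed(42)
x <- seq(10, 20, length.out = 30)
y <- 30 - 0.8 * x + rnorm(30, sd = 1)

# Least-squares fit: coef(fit) returns the intercept and the slope
fit <- lm(y ~ x)
coef(fit)

# The same estimates computed by hand
slope_by_hand     <- cov(x, y) / var(x)
intercept_by_hand <- mean(y) - slope_by_hand * mean(x)
c(intercept = intercept_by_hand, slope = slope_by_hand)
```

The slope tells us how much y changes, on average, for each one-unit increase in x.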

How to read a boxplot?
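To make the parts of a boxplot concrete, here is a short sketch on a made-up vector. `boxplot.stats()` returns the five numbers that are drawn (lower whisker, lower hinge, median, upper hinge, upper whisker) and the points flagged as outliers, using R's default rule of 1.5 times the box height.

```r
# Made-up values with one extreme observation
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

bs <- boxplot.stats(x)
bs$stats   # 1.0 3.0 5.5 8.0 9.0: whisker, hinge, median, hinge, whisker
bs$out     # 30: beyond upper hinge + 1.5 * (8 - 3) = 15.5

# Draw it: the box spans the hinges, the thick line is the median,
# and the isolated point is the outlier
boxplot(x, ylab = "Value")
```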

1. DATA EXPLORATION

To answer our questions, we will use two datasets from long-term tree and plant surveys in Alberta. We can answer our questions using data from a single year; in this tutorial we will use the 2010 data. Now we can better understand the regression line obtained by plotting species richness against soil temperature. It shows a negative relationship (the higher the soil temperature, the fewer the species). In the boxplot, we can see that stands 5 and 6 contain more species than stands 3 and 4. In stands 7 and 8 we can see outliers (extreme values).

Let’s continue exploring the data

We already know the patterns among sites and a possible pattern with soil temperature. But what do species richness and soil temperature tell us through their frequency distributions? Before going deeper, we can learn, or simply refresh, the meaning of the bell curve (normal/Gaussian distribution).

The Bell Curve (Normal/Gaussian Distribution)

Frequency of species richness values in each class

Species richness appears to follow an approximately normal distribution.

Frequency of soil temperature values in each class

Soil temperature does not necessarily follow a normal distribution, but it seems reasonable to assume that it does.
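A histogram gives a first impression of the shape of a distribution; a Q-Q plot and a Shapiro-Wilk test can back it up. The sketch below shows one way to run these checks in base R. The vectors `sr` and `temp_c` are simulated stand-ins (an assumption), not the Hondo measurements.

```r
# Simulated stand-ins for species richness and soil temperature
set.seed(7)
sr     <- rpois(175, lambda = 18)
temp_c <- runif(175, min = 13, max = 21)

# Frequency of values in each class
h <- hist(sr, main = "Species richness", xlab = "Number of species")
hist(temp_c, main = "Soil temperature", xlab = "Soil temperature (C)")

# Q-Q plot: points close to the line suggest approximate normality
qqnorm(sr)
qqline(sr)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw_sr   <- shapiro.test(sr)
sw_temp <- shapiro.test(temp_c)
c(richness = sw_sr$p.value, temperature = sw_temp$p.value)
```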

MODEL CODIFICATION: LET’S GET STARTED

We can visualize species richness as a function of soil temperature: higher species richness is associated with lower soil temperature. We can also visualize the mean species richness of each stand and test whether the stands differ statistically. Stands 5 and 6 have more species than stands 3 and 4. Can we now draw conclusions from these trends and publish our results in Nature?

Species richness as a function of soil temperature

The relationship found is statistically significant (p = 0.02425)

Species richness as a function of stands

There are significant differences among stands (p < 2.2e-16)
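P-values like the two reported above can be read from a fitted `lm` object: the slope test from `summary()` and the overall test for a factor from `anova()`. The sketch below uses simulated data with the same column names as the tutorial (`SR`, `temp_C`, `stand`); the values are assumptions, so the resulting p-values will not match the ones above.

```r
# Simulated data: seven stands with different mean richness (illustrative only)
set.seed(3)
stand <- factor(rep(c(1, 3, 4, 5, 6, 7, 8), each = 25))
sim <- data.frame(stand = stand, temp_C = runif(175, 13, 21))
sim$SR <- c(18, 14, 12, 21, 23, 18, 18)[as.integer(stand)] + rnorm(175, sd = 2)

m_temp  <- lm(SR ~ temp_C, data = sim)
m_stand <- lm(SR ~ stand, data = sim)

# t-test on the slope of soil temperature
p_temp <- summary(m_temp)$coefficients["temp_C", "Pr(>|t|)"]

# F-test for differences among stand means
p_stand <- anova(m_stand)["stand", "Pr(>F)"]

c(temp_C = p_temp, stand = p_stand)
```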

2. MODEL CODIFICATION: DID WE FORGET SOMETHING?

To understand how soil temperature (in degrees Celsius) and site affect biodiversity, we can build three different models with species richness as the response variable (y, the dependent variable). For the independent variables (x), the first model uses soil temperature alone, the second uses site (stand) alone, and the third uses soil temperature and stand together.

# Species richness as a function of soil temperature (C)
M1 <- lm(SR ~ temp_C, data = SR_SoilTemp)
# Species richness as a function of stand
M2 <- lm(SR ~ stand, data = SR_SoilTemp)
# Species richness as a function of soil temperature (C) and stand
M3 <- lm(SR ~ temp_C + stand, data = SR_SoilTemp)

3. MODEL SELECTION

Given the patterns relating species richness to soil temperature and to stand, can we use these results to draw our ecological conclusions? We can compare models M1 and M2 with a potential third model. Could combining soil temperature and stand reveal a pattern that is hidden when each variable is modeled separately? We can use the AICc approach to select the best model. Here we can see that models M2 and M3 are the best options under a linear model (lm) approach with fixed effects. Let’s fit the M3 option to see how stand and soil temperature work together to predict species richness.

##    df    logLik      AICc      delta
## M3  9 -401.3830  821.8570   0.000000
## M2  8 -403.1450  823.1575   1.300477
## M1  3 -503.2308 1012.6019 190.744876
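To see where the AICc values come from, the sketch below recomputes them by hand from the log-likelihood and df columns of the table, using AICc = -2 logLik + 2k + 2k(k + 1) / (n - k - 1). The sample size n = 175 is not stated in the text; it is an assumption, chosen because it reproduces the AICc column.

```r
# Values copied from the model selection table above
loglik <- c(M3 = -401.3830, M2 = -403.1450, M1 = -503.2308)
k      <- c(M3 = 9, M2 = 8, M1 = 3)   # "df": number of estimated parameters
n      <- 175                         # assumption: inferred, not given in the text

aic   <- -2 * loglik + 2 * k
aicc  <- aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
delta <- aicc - min(aicc)                        # difference from the best model

# Akaike weights: relative support for each model within this set
weight <- exp(-delta / 2) / sum(exp(-delta / 2))

round(cbind(df = k, logLik = loglik, AICc = aicc, delta = delta, weight = weight), 4)
```

A delta of about 1.3 between M3 and M2 means the two models have similar support, while M1 has essentially none.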

Now it is time to use our selected model to see how species richness behaves as a function of soil temperature and stand.

M3 <- lm(SR ~ temp_C + stand, data = SR_SoilTemp)

4. MODEL VALIDATION

To check the assumptions of linear regression, we can refresh our understanding of the statistical concept of residuals.

4.1. Homogeneity of the variance

Plot predicted values vs residual values

Homogeneous dispersion of the residuals. The assumption is respected!

4.2. Independence of the model residuals

Check the independence of the model residuals with respect to each covariate of the model

The residuals are homogeneously dispersed around 0 and show no pattern with respect to either variable. The assumption is respected!

4.3. Normality of the model residuals

The residuals follow a normal distribution. The assumption is respected!
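The three checks of this section can be sketched in a few lines of base R. The example below is only an illustration: it fits the same kind of model (`SR ~ temp_C + stand`) to simulated data, so the plots will not match the ones in this tutorial.

```r
# Simulated data with the tutorial's column names (values are assumptions)
set.seed(11)
stand <- factor(rep(c(1, 3, 4, 5, 6, 7, 8), each = 25))
sim <- data.frame(stand = stand, temp_C = runif(175, 13, 21))
sim$SR <- c(18, 14, 12, 21, 23, 18, 18)[as.integer(stand)] +
  0.3 * sim$temp_C + rnorm(175, sd = 2)

fit <- lm(SR ~ temp_C + stand, data = sim)
res <- resid(fit)

op <- par(mfrow = c(2, 2))
# 4.1 Homogeneity of variance: residuals vs fitted values
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# 4.2 Independence: residuals vs each covariate
plot(sim$temp_C, res, xlab = "Soil temperature (C)", ylab = "Residuals")
abline(h = 0, lty = 2)
boxplot(res ~ sim$stand, xlab = "Stand", ylab = "Residuals")
abline(h = 0, lty = 2)
# 4.3 Normality: Q-Q plot of the residuals
qqnorm(res)
qqline(res)
par(op)

# Optional numerical check of normality
sw <- shapiro.test(res)
sw$p.value
```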

5. INTERPRETATION AND VISUALIZATION

EUREKA!!! We have a surprise: the effect of soil temperature shows the opposite pattern to the trend found in model M1. The stand patterns are conserved.
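A reversal like this can happen when the stands differ both in their typical soil temperature and in their typical richness. The sketch below reproduces the effect with simulated data (three hypothetical stands A, B and C, not the Hondo stands): colder stands are richer on average, yet within each stand richness increases slightly with temperature.

```r
# Simulated data built to show the reversal (all values are assumptions)
set.seed(5)
stand <- factor(rep(c("A", "B", "C"), each = 50))
stand_temp <- c(A = 14, B = 17, C = 20)   # mean soil temperature per stand
stand_sr   <- c(A = 24, B = 18, C = 12)   # mean species richness per stand

id <- as.character(stand)
temp_C <- stand_temp[id] + rnorm(150, sd = 1)
SR <- stand_sr[id] + 0.5 * (temp_C - stand_temp[id]) + rnorm(150, sd = 1)
sim <- data.frame(stand = stand, temp_C = unname(temp_C), SR = unname(SR))

m1 <- lm(SR ~ temp_C, data = sim)           # temperature alone
m3 <- lm(SR ~ temp_C + stand, data = sim)   # temperature and stand together

coef(m1)["temp_C"]   # negative: dominated by differences between stands
coef(m3)["temp_C"]   # positive: the trend within stands
```

Without stand in the model, the temperature slope mostly reflects differences between stands; once stand is included, it reflects the trend within stands.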

How can we explain this pattern?

We can see that stand is a more important predictor than soil temperature. It is logical that species diversity depends on more environmental variables than soil temperature alone. Site matters more than soil temperature, and perhaps other factors not considered here could explain species diversity even better. Stands 5 and 6 are exceptional sites in terms of species richness. At this point, we can ask ourselves: do other diversity metrics (e.g. Simpson and Shannon) follow the same pattern as species richness? You can explore this question, or others, using a similar approach!
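If you want to try other diversity metrics, here is a minimal sketch of the Shannon index, H = -sum(p * ln p), and the Simpson index in its 1 - sum(p^2) form, where p is the proportion of each species. The function name `diversity_indices` and the abundance values are hypothetical.

```r
# Richness, Shannon and Simpson indices from a vector of abundances (or cover)
diversity_indices <- function(abund) {
  abund <- abund[abund > 0]    # species that are absent do not count
  p <- abund / sum(abund)      # proportion of each species
  c(richness = length(abund),
    shannon  = -sum(p * log(p)),
    simpson  = 1 - sum(p^2))
}

# Hypothetical quadrant with four equally abundant species
even <- diversity_indices(c(sp1 = 10, sp2 = 10, sp3 = 10, sp4 = 10))
even    # richness 4, shannon log(4) = 1.386..., simpson 0.75

# Same richness, but one dominant species: both indices drop
uneven <- diversity_indices(c(sp1 = 37, sp2 = 1, sp3 = 1, sp4 = 1))
uneven
```

Each index could then replace `SR` as the response variable in the models above.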

##  SR                                               
##  11 when stand is           4 & temp_C <  20      
##  13 when stand is           3 & temp_C <  18      
##  13 when stand is           4 & temp_C >=       20
##  16 when stand is           3 & temp_C >=       18
##  17 when stand is 1 or 7 or 8 & temp_C is 17 to 18
##  18 when stand is      1 or 8 & temp_C is 15 to 16
##  18 when stand is 1 or 7 or 8 & temp_C >=       19
##  18 when stand is           7 & temp_C is 15 to 16
##  18 when stand is 1 or 7 or 8 & temp_C is 18 to 19
##  19 when stand is 1 or 7 or 8 & temp_C <  15      
##  21 when stand is           5 & temp_C <  19      
##  21 when stand is 1 or 7 or 8 & temp_C is 16 to 17
##  22 when stand is           5 & temp_C >=       19
##  23 when stand is           6 & temp_C >=       17
##  23 when stand is           6 & temp_C <  17
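The rule table above has the layout printed by `rpart.rules()` from the rpart.plot package. The tutorial does not show the call, so the sketch below is an assumption about how such a table could be produced: a regression tree fitted with `rpart()` to simulated data. The rules are only printed if rpart.plot is installed.

```r
library(rpart)

# Simulated data with the tutorial's column names (values are assumptions)
set.seed(9)
stand <- factor(rep(c(1, 3, 4, 5, 6, 7, 8), each = 25))
sim <- data.frame(stand = stand, temp_C = runif(175, 13, 21))
sim$SR <- round(c(18, 14, 12, 21, 23, 18, 18)[as.integer(stand)] +
                  rnorm(175, sd = 2))

# Regression tree: splits the data by stand and temperature to predict SR
tree <- rpart(SR ~ temp_C + stand, data = sim, method = "anova")

# One row per leaf: predicted SR and the conditions that lead to it
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  print(rpart.plot::rpart.rules(tree))
}
```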

REPRODUCIBILITY

The graphs and results presented in this tutorial were obtained from the historical soil temperature and vascular plant datasets for the Hondo stands. The data are available at [https://dataverse.scholarsportal.info/dataset.xhtml?persistentId=doi:10.5683/SP3/PZCAVE]. We imported the original datasets using Import Dataset in RStudio.

Hondo_VascularCover_1980_2015 # Historical vascular plant cover
str(Hondo_VascularCover_1980_2015)
Hondo_SoilTemp_1980_2010 # Historical soil temperature
str(Hondo_SoilTemp_1980_2010)

Dataset manipulation

1. Generate a subset containing only the 2010 data to simplify the statistical analyses. This keeps the focus on the core ecological concept: species richness in relation to space and environment.

Hondo_VascularCover_2010 <- subset(Hondo_VascularCover_1980_2015, year == "2010")   # Keep only the rows from 2010
Hondo_SoilTemp_2010 <- subset(Hondo_SoilTemp_1980_2010, year == "2010")

2. Save the 2010 subsets to your computer so they can be cleaned and prepared for work in R.

write.csv(x = Hondo_VascularCover_2010, file = "Hondo_VascularCover_2010.csv", row.names = FALSE) # Export data in csv format
write.csv(x = Hondo_SoilTemp_2010, file = "Hondo_SoilTemp_2010.csv", row.names = FALSE)

3. Open the 2010 subsets in Excel and sort both by stand and quad, then verify that the rows correspond exactly.

4. Generate a new data frame summarizing stand, quadrant, soil temperature and species richness. You can see here that quadrants and stands were merged adequately.
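The sorting and matching step can also be done in R with `merge()`, which joins two tables on shared key columns and does not depend on row order. The sketch below assumes a long-format cover table with one row per species and quadrant; the toy data and the column names `species`, `cover` and `temp_C` are hypothetical and may differ from the real files.

```r
# Toy cover table: one row per species recorded in a quadrant
cover <- data.frame(
  stand   = c(1, 1, 1, 1, 3, 3),
  quad    = c(1, 1, 2, 2, 1, 1),
  species = c("a", "b", "a", "c", "b", "c"),
  cover   = c(5, 0, 2, 1, 3, 4)
)
# Toy soil temperature table: one row per quadrant
soil <- data.frame(stand = c(1, 1, 3), quad = c(1, 2, 1),
                   temp_C = c(15.2, 16.0, 18.5))

# Species richness = number of distinct species present in each quadrant
present  <- cover[cover$cover > 0, ]
richness <- aggregate(species ~ stand + quad, data = present,
                      FUN = function(s) length(unique(s)))
names(richness)[names(richness) == "species"] <- "SR"

# Join richness and soil temperature on stand and quadrant
SR_SoilTemp <- merge(richness, soil, by = c("stand", "quad"))
SR_SoilTemp$stand <- factor(SR_SoilTemp$stand)   # stand is a category, not a number
SR_SoilTemp
```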

COMMENTS

This tutorial was created using RStudio and RMarkdown. The entire code to reproduce the results and graphs in this tutorial is available in the Living Data working group project on its GitHub page.

library(ggplot2)
library(multcompView)
library(MuMIn)
library(rpart)
library(rpart.plot)
library(vembedr)

6. REFERENCES

Canadian Endangered Species Conservation Council (CESCC). 2010. Wild Species 2010: The general status of Species in Canada.

Jin, Y., and Qian, H. 2019. V.PhyloMaker: an R package that can generate very large phylogenies for vascular plants. Ecography, 42: 1353-1359.

Packer, J.G., and Gould, A.J. 2017. Vascular plants of Alberta, part 1: Ferns, Fern Allies, Gymnosperms, and monocots. University of Calgary Press. 281 pages.

Earle, C.J. 2021. The Gymnosperm Database. Consulted on April 7, 2022: [https://www.conifers.org/zz/gymnosperms.php].

Go Botany (3.7). 2022. Native Plant Trust. Consulted on April 7, 2022: [https://gobotany.nativeplanttrust.org]

Fritts, H. C. (1987). Tree-ring analysis. In Climatology (pp. 858–875). Springer US. https://doi.org/10.1007/0-387-30749-4_182

Tumajer, J., & Lehejček, J. (2019). Boreal tree-rings are influenced by temperature up to two years prior to their formation: A trade-off between growth and reproduction? Environmental Research Letters, 14(12), 124024. https://doi.org/10.1088/1748-9326/ab5134

NOAA. (n.d.). Picture Climate: How Can We Learn from Tree Rings? National Centers for Environmental Information (NCEI), formerly known as National Climatic Data Center (NCDC). Retrieved 8 April 2022, from https://www.ncdc.noaa.gov/news/picture-climate-how-can-we-learn-tree-rings

                                                    END